AITopics

Country:

South America > Peru (0.04)
South America > Colombia (0.04)
North America > Mexico (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Consumer Health (1.00)
Education (0.93)
Government (0.92)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing SystemsFeb-8-2026, 10:06:54 GMT

TrainingLinearFinite-StateMachines

A finite-state machine (FSM) is a computation model to process binary strings in sequential circuits.

artificial intelligence, machine learning, natural language, (19 more...)

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Xia, Eric, Klusowski, Jason M.

Classification Imbalance as Transfer Learning

arXiv.org Machine LearningJan-16-2026

Classification imbalance arises when one class is much rarer than the other. We frame this setting as transfer learning under label (prior) shift between an imbalanced source distribution induced by the observed data and a balanced target distribution under which performance is evaluated. Within this framework, we study a family of oversampling procedures that augment the training data by generating synthetic samples from an estimated minority-class distribution to roughly balance the classes, among which the celebrated SMOTE algorithm is a canonical example. We show that the excess risk decomposes into the rate achievable under balanced training (as if the data had been drawn from the balanced target distribution) and an additional term, the cost of transfer, which quantifies the discrepancy between the estimated and true minority-class distributions. In particular, we show that the cost of transfer for SMOTE dominates that of bootstrapping (random oversampling) in moderately high dimensions, suggesting that we should expect bootstrapping to have better performance than SMOTE in general. We corroborate these findings with experimental evidence. More broadly, our results provide guidance for choosing among augmentation strategies for imbalanced classification.

artificial intelligence, machine learning, probability, (16 more...)

2601.1063

Country: North America > United States (0.27)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.61)

Ki, Dohyeong, Guntuboyina, Adityanand

What Functions Does XGBoost Learn?

arXiv.org Machine LearningJan-12-2026

This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function class $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ that extends finite ensembles of bounded-depth regression trees, together with a complexity measure $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ that generalizes the $L^1$ regularization penalty used in XGBoost. We show that every optimizer of the XGBoost objective is also an optimizer of an equivalent penalized regression problem over $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ with penalty $V^{d, s}_{\infty-\text{XGB}}(\cdot)$, providing an interpretation of XGBoost as implicitly targeting a broader function class. We also develop a smoothness-based interpretation of $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ and $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ in terms of Hardy--Krause variation. We prove that the least squares estimator over $\{f \in \mathcal{F}^{d, s}_{\infty-\text{ST}}: V^{d, s}_{\infty-\text{XGB}}(f) \le V\}$ achieves a nearly minimax-optimal rate of convergence $n^{-2/3} (\log n)^{4(\min(s, d) - 1)/3}$, thereby avoiding the curse of dimensionality. Our results provide the first rigorous characterization of the function space underlying XGBoost, clarify its connection to classical notions of variation, and identify an important open problem: whether the XGBoost algorithm itself achieves minimax optimality over this class.

artificial intelligence, machine learning, variation, (19 more...)

2601.05444

Country: North America > United States (0.92)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Herold, Martin G., Kipouridis, Evangelos, Spoerhase, Joachim

A Broader View on Clustering under Cluster-Aware Norm Objectives

arXiv.org Artificial IntelligenceDec-10-2025

We revisit the $(f,g)$-clustering problem that we introduced in a recent work [SODA'25], and which subsumes fundamental clustering problems such as $k$-Center, $k$-Median, Min-Sum of Radii, and Min-Load $k$-Clustering. This problem assigns each of the $k$ clusters a cost determined by the monotone, symmetric norm $f$ applied to the vector distances in the cluster, and aims at minimizing the norm $g$ applied to the vector of cluster costs. Previously, we focused on certain special cases for which we designed constant-factor approximation algorithms. Our bounds for more general settings left, however, large gaps to the known bounds for the basic problems they capture. In this work, we provide a clearer picture of the approximability of these more general settings. First, we design an $O(\log^2 n)$-approximation algorithm for $(f, L_{1})$-clustering for any $f$. This improves upon our previous $\widetilde{O}(\sqrt{n})$-approximation. Second, we provide an $O(k)$-approximation for the general $(f,g)$-clustering problem, which improves upon our previous $\widetilde{O}(\sqrt{kn})$-approximation algorithm and matches the best-known upper bound for Min-Load $k$-Clustering. We then design an approximation algorithm for $(f,g)$-clustering that interpolates, up to polylog factors, between the best known bounds for $k$-Center, $k$-Median, Min-Sum of Radii, Min-Load $k$-Clustering, (Top, $L_{1}$)-clustering, and $(L_{\infty},g)$-clustering based on a newly defined parameter of $f$ and $g$.

artificial intelligence, machine learning, px 1, (16 more...)

arXiv.org Artificial Intelligence

2512.06211

Country: Europe > United Kingdom (0.27)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Neufeld, Ariel, Schmocker, Philipp, Tran, Viet Khoa

Approximation rates of quantum neural networks for periodic functions via Jackson's inequality

arXiv.org Machine LearningNov-21-2025

Quantum neural networks (QNNs) are an analog of classical neural networks in the world of quantum computing, which are represented by a unitary matrix with trainable parameters. Inspired by the universal approximation property of classical neural networks, ensuring that every continuous function can be arbitrarily well approximated uniformly on a compact set of a Euclidean space, some recent works have established analogous results for QNNs, ranging from single-qubit to multi-qubit QNNs, and even hybrid classical-quantum models. In this paper, we study the approximation capabilities of QNNs for periodic functions with respect to the supremum norm. We use the Jackson inequality to approximate a given function by implementing its approximating trigonometric polynomial via a suitable QNN. In particular, we see that by restricting to the class of periodic functions, one can achieve a quadratic reduction of the number of parameters, producing better approximation results than in the literature. Moreover, the smoother the function, the fewer parameters are needed to construct a QNN to approximate the function.

artificial intelligence, machine learning, quantum neural network, (15 more...)

2511.16149

Country: North America > United States (0.93)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsNov-20-2025, 06:03:35 GMT

A Theoretical Perspective for Speculative Decoding Algorithm Ming Yin

Transformer-based autoregressive sampling has been the major bottleneck for slowing down large language model inferences.

large language model, machine learning, natural language, (17 more...)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Francobaldi, Matteo, Lombardi, Michele, Lodi, Andrea

SMiLE: Provably Enforcing Global Relational Properties in Neural Networks

arXiv.org Artificial IntelligenceNov-11-2025

Artificial Intelligence systems are increasingly deployed in settings where ensuring robustness, fairness, or domain-specific properties is essential for regulation compliance and alignment with human values. However, especially on Neural Networks, property enforcement is very challenging, and existing methods are limited to specific constraints or local properties (defined around datapoints), or fail to provide full guarantees. We tackle these limitations by extending SMiLE, a recently proposed enforcement framework for NNs, to support global relational properties (defined over the entire input space). The proposed approach scales well with model complexity, accommodates general properties and backbones, and provides full satisfaction guarantees. We evaluate SMiLE on monotonicity, global robustness, and individual fairness, on synthetic and real data, for regression and classification tasks. Our approach is competitive with property-specific baselines in terms of accuracy and runtime, and strictly superior in terms of generality and level of guarantees. Overall, our results emphasize the potential of the SMiLE framework as a platform for future research and applications.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.07208

Country: Europe (0.67)

Genre: Research Report > New Finding (0.66)

Industry: Law (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing SystemsOct-10-2025, 19:59:47 GMT

A Theoretical Perspective for Speculative Decoding Algorithm Ming Yin

Transformer-based autoregressive sampling has been the major bottleneck for slowing down large language model inferences.

algorithm, arxiv preprint arxiv, rejection, (13 more...)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Catanzariti, Magaly, Aimar, Hugo, Mateos, Diego M.

Divergence Phase Index: A Riesz-Transform Framework for Multidimensional Phase Difference Analysis

arXiv.org Machine LearningOct-7-2025

We introduce the Divergence Phase Index (DPI), a novel framework for quantifying phase differences in one and multidimensional signals, grounded in harmonic analysis via the Riesz transform. Based on classical Hilbert Transform phase measures, the DPI extends these principles to higher dimensions, offering a geometry-aware metric that is invariant to intensity scaling and sensitive to structural changes. We applied this method on both synthetic and real-world datasets, including intracranial EEG (iEEG) recordings during epileptic seizures, high-resolution microscopy images, and paintings. In the 1D case, the DPI robustly detects hypersynchronization associated with generalized epilepsy, while in 2D, it reveals subtle, imperceptible changes in images and artworks. Additionally, it can detect rotational variations in highly isotropic microscopy images. The DPI's robustness to amplitude variations and its adaptability across domains enable its use in diverse applications from nonlinear dynamics, complex systems analysis, to multidimensional signal processing.

divergence phase index, phase difference, riesz transform, (14 more...)

2510.04426

Country:

South America > Argentina (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.87)
Health & Medicine > Therapeutic Area > Genetic Disease (0.87)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence (0.88)